[SPARK-12340][SQL]fix Int overflow in the SparkPlan.executeTake, RDD.take and AsyncRDDActions.takeAsync #10562
Conversation
Test build #2307 has finished for PR 10562 at commit
I don't think a blank line is needed here.
I have removed some blank lines.
@QiangCai the problem isn't blank lines but whitespace at the end of your lines.
@srowen I have removed the trailing whitespace.
Test build #2310 has finished for PR 10562 at commit
@srowen I have no idea how to resolve this unit test failure. Would you help me?
@QiangCai I think the test failures are unrelated. However, before we can retest you'll have to rebase, as there is a merge conflict now.
…take and AsyncRDDActions.takeAsync
@srowen I have rebased from master and resolved all conflicts.
This shouldn't be here
I have removed it.
Test build #2325 has finished for PR 10562 at commit
@srowen I have found some error messages in the test build log: an OutOfMemoryError has occurred. The code at line 71 of AsyncRDDActions.scala is "val results = new ArrayBuffer[T](num)"; because the parameter num (2147483638) is so large, the JVM cannot allocate enough memory. Error messages:
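For anyone reading along, here is a minimal sketch of why that allocation fails (illustrative only, not the actual Spark source; the names and values follow the comment above). ArrayBuffer's initial-size constructor allocates its backing array eagerly, so a num near Int.MaxValue demands billions of reference slots before anything is appended.

```scala
import scala.collection.mutable.ArrayBuffer

// Illustrative sketch: new ArrayBuffer[T](initialSize) eagerly allocates a
// backing Array[AnyRef] of that length. With num = 2147483638 that is about
// 2.1 billion references (roughly 17 GB), so most JVMs throw
// java.lang.OutOfMemoryError before a single result is collected.
object PreSizedBufferDemo {
  def main(args: Array[String]): Unit = {
    val num = 2147483638                    // the value reported in the test log
    val results = new ArrayBuffer[Int](num) // OutOfMemoryError on any normal heap
    println(results.length)
  }
}
```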
Why the instance of …
If you have a chance to modify this again, please insert a space between ) and {.
I will do it.
@sarutak Maybe we have found another bug. I will try to fix it.
I have removed the initial size num. The initial size will now be the default value of 16, the same as in RDD#take, which should be fine.
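A sketch of the change being described (illustrative, not the exact patch): drop the explicit initial size so the buffer starts at the default capacity of 16 and grows as partial results arrive, which matches what RDD#take does.

```scala
import scala.collection.mutable.ArrayBuffer

// Illustrative sketch: with no initial size the buffer starts at the default
// capacity (16) and grows geometrically on append, so a huge requested `num`
// no longer forces a huge up-front allocation.
def newResultBuffer[T](): ArrayBuffer[T] = new ArrayBuffer[T]()
```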
Test build #2326 has finished for PR 10562 at commit
I think I have resolved this problem.
LGTM
Merging this into master.
@QiangCai We have many conflicts against branch-1.6.
OK. I have created another PR #10619 to merge this code into branch-1.6.
Why is this change necessary? When can partsScanned go above 2B?
Ah, you're right. partsScanned cannot exceed the value of totalParts.
I'll change it back to Int.
I think there is a legit problem here. Imagine totalParts is close to Int.MaxValue, and imagine partsScanned is close to totalParts. Adding p.size to it below could cause it to roll over. I think this change is needed.
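To make the roll-over concern concrete, here is a tiny example with hypothetical numbers (not Spark code):

```scala
// Int arithmetic wraps silently: a counter already near Int.MaxValue that has
// another batch of partition counts added to it goes negative, which would
// break a loop condition such as `while (partsScanned < totalParts)`.
object IntOverflowDemo extends App {
  val partsScanned: Int = Int.MaxValue - 5
  val batchSize: Int = 10
  println(partsScanned + batchSize) // prints -2147483644 rather than 2147483652
}
```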
That's never possible -- if we have anywhere near 2B partitions, the scheduler won't be fast enough to schedule them. As a matter of fact, if we have anything larger than a few million, the scheduler will likely crash.
Fair point: in practice this all but certainly won't happen. Note that this patch was already committed to master, making this a Long. It doesn't hurt and is, at least theoretically, more correct locally. I suppose it isn't worth updating again, but I do not feel strongly about it.
I'd prefer to change it back since it is so little work, so this does not start a trend of changing all Ints to Longs for no reason. Note that this also raises questions about why this value could ever be greater than Int.MaxValue when somebody reads this code in the future.
Also @srowen, even if totalParts is close to Int.MaxValue, I don't think partsScanned can be greater than Int.MaxValue, because we never scan more parts than the number of parts available.
Ah, OK, you were referring to partsScanned + numPartsToTry. We should just cast that to Long to minimize the impact.
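A sketch of that suggestion (variable names from the discussion above, values hypothetical): widen only the sum to Long so the intermediate result cannot wrap, while the counters themselves stay Int.

```scala
object LongCastDemo extends App {
  // Hypothetical values: each fits in an Int, but their sum does not.
  val partsScanned: Int = 2000000000
  val numPartsToTry: Int = 400000000

  val wrapped: Int  = partsScanned + numPartsToTry        // overflows to -1894967296
  val widened: Long = partsScanned.toLong + numPartsToTry // 2400000000, as intended

  println(s"Int sum:  $wrapped")
  println(s"Long sum: $widened")
}
```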
@QiangCai it would be great if you could submit a new pull request to address the comments. Thanks.
This is a follow-up for the original patch apache#10562.
This is a follow-up for the original patch #10562.
Author: Reynold Xin <rxin@databricks.com>
Closes #10670 from rxin/SPARK-12340.
I have closed pull request #10487, and I have created this pull request to resolve the problem.
Spark JIRA:
https://issues.apache.org/jira/browse/SPARK-12340